AITopics | source policy

A Related Work

Transfer in reinforcement learning aims to solve a new target task by exploiting agents and information obtained from source tasks. The goal is to need either no additional learning or only a sample-efficient amount of it. We review one relevant line of research: approaches that reuse policies learned on source tasks for target tasks. Fernández and Veloso [17] propose an exploration strategy for learning a new policy given a new task and a set of learned source policies. The gain of using each policy is estimated online, and at each step one policy from the set is selected probabilistically according to its estimated gain. However, they focus on aiding the training of the target policy with samples from the target task rather than on improving zero-shot transfer performance. Dayan [14], on the other hand, introduces successor representations (SRs). These are state-occupancy representations that are disentangled from rewards and allow value functions to be decomposed linearly.
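The linear decomposition that SRs allow can be illustrated with a minimal tabular sketch. It assumes the standard tabular definition of the SR for a fixed policy with state-to-state transition matrix P:

M = sum_t gamma^t P^t = (I - gamma P)^-1

Once M is known, the value of any reward vector r is simply V = M r. To stay dependency-free, the sketch uses fixed-point iteration rather than a matrix inverse. The function names and the two-state example are illustrative only.

```python
# Minimal tabular sketch of successor representations (SRs).
# M[s][s'] is the expected discounted number of visits to s' starting from s under a
# fixed policy: M = sum_t gamma^t P^t = (I - gamma P)^-1, so V = M r for any reward r.


def successor_representation(P, gamma, iters=1000):
    """Approximate M by iterating the fixed point M <- I + gamma * P @ M.

    P is the policy's row-stochastic state-to-state transition matrix (list of lists);
    the iteration converges because 0 <= gamma < 1.
    """
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iters):
        # The comprehension reads the previous M and builds the next iterate.
        M = [
            [identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)
        ]
    return M


def value_from_sr(M, r):
    """Linear value decomposition: V(s) = sum over s' of M[s][s'] * r(s')."""
    return [sum(m * r_s for m, r_s in zip(row, r)) for row in M]


# Hypothetical two-state chain: state 0 moves to state 1, which is absorbing.
P = [[0.0, 1.0], [0.0, 1.0]]
M = successor_representation(P, gamma=0.5)
print(value_from_sr(M, [0.0, 1.0]))  # reward only in state 1
print(value_from_sr(M, [1.0, 0.0]))  # a different reward, reusing the same M
```

Changing the reward only requires a new matrix-vector product with the same M. This is the property that makes SRs attractive for transfer between tasks that share dynamics but differ in rewards.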

large language model, machine learning, target task, (21 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

b09df3a10e26204136540ca59bc5a646-Paper-Conference.pdf

Neural Information Processing Systems | Feb-11-2026, 09:37:15 GMT

algorithm, source policy, target policy, (11 more...)

Neural Information Processing Systems

Country: Asia > China > Heilongjiang Province > Harbin (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

90610aa0e24f63ec6d2637e06f9b9af2-Paper.pdf

Neural Information Processing Systems | Feb-9-2026, 21:45:07 GMT

policy evaluation, reinforcement learning, successor feature, (14 more...)

Neural Information Processing Systems

Country: North America > Canada > Ontario > Toronto (0.14)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

CUP: Critic-Guided Policy Reuse

Neural Information Processing Systems | Dec-25-2025, 00:36:52 GMT

The ability to reuse previous policies is an important aspect of human intelligence. To achieve efficient policy reuse, a Deep Reinforcement Learning (DRL) agent needs to decide when to reuse and which source policies to reuse. Previous methods solve this problem by introducing extra components to the underlying algorithm, such as hierarchical high-level policies over source policies, or estimations of source policies' value functions on the target task. However, training these components induces either optimization non-stationarity or heavy sampling cost, significantly impairing the effectiveness of transfer. To tackle this problem, we propose a novel policy reuse algorithm called Critic-gUided Policy reuse (CUP), which avoids training any extra components and efficiently reuses source policies.
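The abstract says CUP decides when and which source policies to reuse by relying on the critic, without training extra components. Below is a minimal sketch of that general idea under our own assumptions; it is not the paper's exact criterion.

- At each state, the target task's critic scores the action each policy would take.
- The agent follows the best-scoring source policy only if that policy beats the target policy's own action by more than a margin.

The selection rule, the `margin` parameter and all names are hypothetical.

```python
from typing import Any, Callable, Sequence

State = Any
Action = Any
Policy = Callable[[State], Action]
Critic = Callable[[State, Action], float]


def critic_guided_choice(
    state: State,
    target_policy: Policy,
    source_policies: Sequence[Policy],
    critic: Critic,
    margin: float = 0.0,
) -> int:
    """Return -1 to keep the target policy, or the index of the source policy to follow.

    Hypothetical rule: evaluate every candidate action with the existing critic and
    switch to the best source policy only if it beats the target action by > margin.
    Nothing new is trained; the critic is only queried.
    """
    best_index = -1
    best_value = critic(state, target_policy(state)) + margin
    for i, policy in enumerate(source_policies):
        value = critic(state, policy(state))
        if value > best_value:
            best_index, best_value = i, value
    return best_index


# Toy usage with a lookup-table critic for a single state "s".
q = {("s", "left"): 1.0, ("s", "right"): 3.0, ("s", "stay"): 2.0}
choice = critic_guided_choice(
    "s",
    target_policy=lambda s: "stay",
    source_policies=[lambda s: "left", lambda s: "right"],
    critic=lambda s, a: q[(s, a)],
)
print(choice)
```

Because the choice is recomputed at every state, a single rule covers both questions. The return value tells the agent when to reuse (any non-negative index) and which source policy to reuse (the index itself).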

critic-guided policy reuse, name change, source policy, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Checklist 1. For all authors (a)

Neural Information Processing Systems | Aug-22-2025, 01:18:47 GMT

Do the main claims made in the abstract and introduction accurately reflect the paper's Did you discuss any potential negative societal impacts of your work? Did you state the full set of assumptions of all theoretical results? Did you include complete proofs of all theoretical results? Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See the Did you specify all the training details (e.g., data splits, hyperparameters, how they Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? Did you include the total amount of compute and the type of resources used (e.g., type Did you include any new assets either in the supplemental material or as a URL? [N/A] Did you discuss whether and how consent was obtained from people whose data you're If you used crowdsourcing or conducted research with human subjects... (a) We believe policy reuse serves as a promising way to transfer knowledge among AI agents.

artificial intelligence, source policy, tar, (16 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Industry: Social Sector (0.34)

Technology: Information Technology > Artificial Intelligence (0.48)

CUP: Critic-Guided Policy Reuse

Neural Information Processing Systems | Aug-22-2025, 01:18:43 GMT

The ability to reuse previous policies is an important aspect of human intelligence.

machine learning, reinforcement learning, source policy, (12 more...)

Neural Information Processing Systems

Country: Asia > China > Heilongjiang Province > Harbin (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Risk-Aware Transfer in Reinforcement Learning using Successor Features

Neural Information Processing Systems | Aug-15-2025, 23:27:51 GMT

However, the problem of transferring skills in a risk-aware manner is not well-understood.

policy evaluation, reinforcement learning, successor feature, (12 more...)

Neural Information Processing Systems

Country: North America > Canada > Ontario > Toronto (0.14)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

FAST: Similarity-based Knowledge Transfer for Efficient Policy Learning

Capurso, Alessandro, Piccoli, Elia, Bacciu, Davide

arXiv.org Artificial Intelligence | Jul-29-2025

Transfer Learning (TL) offers the potential to accelerate learning by transferring knowledge across tasks. However, it faces critical challenges such as negative transfer, domain adaptation, and inefficiency in selecting solid source policies. These issues often represent critical problems in evolving domains such as game development, where scenarios change and agents must adapt. Continuously releasing new agents is costly and inefficient. In this work we address the key issues in TL with three goals: to improve knowledge transfer, to improve agent performance across tasks, and to reduce computational costs. The proposed methodology is called FAST (Framework for Adaptive Similarity-based Transfer). It leverages visual frames and textual descriptions to create a latent representation of task dynamics, which is used to estimate the similarity between environments. The similarity scores guide our method in choosing candidate policies from which to transfer abilities, simplifying the learning of novel tasks. Experimental results over multiple racing tracks show that FAST achieves final performance competitive with learning-from-scratch methods while requiring significantly fewer training steps.

Learning is often thought of as a process rooted in interactions with the environment. Reinforcement Learning (RL) expands on this core concept by viewing learning as a trial-and-error process. Agents engage with the environment, make choices, and receive feedback in the form of rewards or penalties. Traditionally, agents are trained from scratch to accomplish a single task. This requires far more interactions with the environment to reach proficiency than a human would need for comparable tasks. One primary challenge in RL is the substantial computational demand imposed by simulation, since training time and data requirements scale up for complex tasks. In game development and other evolving environments, starting from zero at each iteration is expensive and suboptimal.
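The snippet says FAST estimates environment similarity from latent task representations and uses the scores to choose candidate source policies. The sketch below shows only the selection step. It assumes the task embeddings are already computed; FAST's encoder over visual frames and text is not reproduced. Cosine similarity, the top-k cutoff, the `min_similarity` threshold and the track names are our assumptions, not details from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_source_policies(
    target_embedding: Sequence[float],
    source_embeddings: Dict[str, Sequence[float]],
    k: int = 2,
    min_similarity: float = -1.0,
) -> List[Tuple[str, float]]:
    """Return up to k (source name, similarity) pairs, most similar first.

    Sources below `min_similarity` are dropped, which is one simple (assumed) way
    to avoid transferring from dissimilar tasks.
    """
    scored = [(name, cosine_similarity(target_embedding, emb))
              for name, emb in source_embeddings.items()]
    scored = [item for item in scored if item[1] >= min_similarity]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings for three previously solved tracks and a new one.
sources = {"track_a": [1.0, 0.0], "track_b": [0.0, 1.0], "track_c": [1.0, 1.0]}
print(rank_source_policies([1.0, 0.1], sources, k=2))
```

The `min_similarity` threshold is one simple way to skip sources that look too dissimilar. It addresses the negative-transfer concern raised in the abstract.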

artificial intelligence, machine learning, vehicle, (18 more...)

arXiv.org Artificial Intelligence

2507.20433

Country: Europe (0.28)

Genre: Research Report (0.82)

Industry:

Education (0.93)
Leisure & Entertainment > Games > Computer Games (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks

Guo, Yijie, Tang, Bingjie, Akinola, Iretiayo, Fox, Dieter, Gupta, Abhishek, Narang, Yashraj

arXiv.org Artificial Intelligence | Mar-6-2025

Enabling robots to learn novel tasks in a data-efficient manner is a long-standing challenge. Common strategies involve carefully leveraging prior experiences, especially transition data collected on related tasks. Although much progress has been made for general pick-and-place manipulation, far fewer studies have investigated contact-rich assembly tasks, where precise control is essential. We introduce SRSA (Skill Retrieval and Skill Adaptation), a novel framework designed to address this problem by utilizing a pre-existing skill library containing policies for diverse assembly tasks. The challenge lies in identifying which skill from the library is most relevant for fine-tuning on a new task. Our key hypothesis is that skills showing higher zero-shot success rates on a new task are better suited for rapid and effective fine-tuning on that task. To this end, we propose to predict the transfer success for all skills in the skill library on a novel task, and then use this prediction to guide the skill retrieval process. We establish a framework that jointly captures features of object geometry, physical dynamics, and expert actions to represent the tasks, allowing us to efficiently learn the transfer success predictor. Extensive experiments demonstrate that SRSA significantly outperforms the leading baseline. When retrieving and fine-tuning skills on unseen tasks, SRSA achieves a 19% relative improvement in success rate, exhibits 2.6x lower standard deviation across random seeds, and requires 2.4x fewer transition samples to reach a satisfactory success rate, compared to the baseline. Furthermore, policies trained with SRSA in simulation achieve a 90% mean success rate when deployed in the real world. Please visit our project webpage https://srsa2024.github.io/.
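SRSA's retrieval step, as described in the abstract, predicts the transfer success of every skill in the library on the new task and retrieves the most promising one. A minimal sketch of that step follows. The learned predictor is replaced by a hypothetical logistic scorer over feature vectors. These stand in for SRSA's joint geometry, dynamics and expert-action features, which are not reproduced here. The weights, the element-wise feature interaction and the skill names are illustrative assumptions.

```python
import math
from typing import Callable, Dict, Sequence

Features = Sequence[float]
Predictor = Callable[[Features, Features], float]


def logistic_predictor(weights: Sequence[float], bias: float) -> Predictor:
    """Hypothetical stand-in for a learned transfer-success predictor.

    Scores a (source skill, target task) pair from the element-wise product of
    their feature vectors and squashes the result into a probability in (0, 1).
    """
    def predict(source: Features, target: Features) -> float:
        z = bias + sum(w * s * t for w, s, t in zip(weights, source, target))
        return 1.0 / (1.0 + math.exp(-z))
    return predict


def retrieve_skill(library: Dict[str, Features], target: Features,
                   predict: Predictor) -> str:
    """Return the skill whose predicted zero-shot success on the target is highest."""
    return max(library, key=lambda name: predict(library[name], target))


# Hypothetical two-skill library and a new assembly task's features.
predict = logistic_predictor([1.0, 1.0, 1.0], bias=0.0)
library = {"peg_insert": [1.0, 0.0, 1.0], "gear_mesh": [0.0, 1.0, 0.0]}
print(retrieve_skill(library, [1.0, 0.2, 1.0], predict))
```

In the setting the abstract describes, the retrieved skill would then serve as the starting point for fine-tuning on the new task. That follows the paper's hypothesis that higher predicted zero-shot success indicates a better initialization.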

assembly task, skill library, target task, (13 more...)

arXiv.org Artificial Intelligence

2503.04538

Country:

North America > United States > California (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > Montserrat (0.04)

Genre: Research Report (0.63)

Industry: Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

CUP: Critic-Guided Policy Reuse

Neural Information Processing Systems | Jan-18-2025, 13:55:49 GMT

The ability to reuse previous policies is an important aspect of human intelligence. To achieve efficient policy reuse, a Deep Reinforcement Learning (DRL) agent needs to decide when to reuse and which source policies to reuse. Previous methods solve this problem by introducing extra components to the underlying algorithm, such as hierarchical high-level policies over source policies, or estimations of source policies' value functions on the target task. However, training these components induces either optimization non-stationarity or heavy sampling cost, significantly impairing the effectiveness of transfer. To tackle this problem, we propose a novel policy reuse algorithm called Critic-gUided Policy reuse (CUP), which avoids training any extra components and efficiently reuses source policies.

critic-guided policy reuse, guidance policy, source policy, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.62)
